Results 1 - 20 of 42
1.
PLOS Digit Health ; 3(3): e0000459, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38489347

ABSTRACT

BACKGROUND: Systemic inflammatory response syndrome (SIRS) and sepsis are the most common causes of in-hospital death. However, the characteristics associated with improvement in patient conditions during the ICU stay have not been fully elucidated for each population, nor have the possible differences between the two. GOAL: The aim of this study is to highlight the differences between the prognostic clinical features for the survival of patients diagnosed with SIRS and those of patients diagnosed with sepsis by using a multi-variable predictive modeling approach with a reduced set of easily available measurements collected at admission to the intensive care unit (ICU). METHODS: Data were collected from 1,257 patients (816 non-sepsis SIRS and 441 sepsis) admitted to the ICU. We compared the performance of five machine learning models in predicting patient survival. The Matthews correlation coefficient (MCC) was used to evaluate model performance and feature importance, by applying Monte Carlo stratified cross-validation. RESULTS: Extreme Gradient Boosting (MCC = 0.489) and Logistic Regression (MCC = 0.533) achieved the highest results for SIRS and sepsis cohorts, respectively. In order of importance, APACHE II, mean platelet volume (MPV), eosinophil counts (EoC), and C-reactive protein (CRP) showed higher importance for predicting sepsis patient survival, whereas SOFA, APACHE II, platelet counts (PLTC), and CRP obtained higher importance in the SIRS cohort. CONCLUSION: By using complete blood count parameters as predictors of ICU patient survival, machine learning models can accurately predict the survival of SIRS and sepsis ICU patients. Interestingly, feature importance highlights the role of CRP and APACHE II in both SIRS and sepsis populations. In addition, MPV and EoC are shown to be important features for the sepsis population only, whereas SOFA and PLTC have higher importance for SIRS patients.
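The Monte Carlo stratified cross-validation mentioned in the methods can be illustrated with a minimal sketch. It is not the authors' code: the function name, the 20% test fraction, the number of repeats, and the toy labels are all assumptions made for illustration. Each repeat draws a fresh random split while keeping the class proportions roughly equal in the training and test parts.

```python
import random
from collections import defaultdict

def monte_carlo_stratified_splits(labels, n_repeats=100, test_fraction=0.2, seed=0):
    """Yield (train_indices, test_indices) pairs.

    Hypothetical helper: every repeat reshuffles each class separately and
    sends about `test_fraction` of that class to the test set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    for _ in range(n_repeats):
        train, test = [], []
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # At least one sample of each class goes to the test set.
            n_test = max(1, round(len(shuffled) * test_fraction))
            test.extend(shuffled[:n_test])
            train.extend(shuffled[n_test:])
        yield sorted(train), sorted(test)

# Toy labels (assumed): 30 positive and 70 negative patients.
toy_labels = [1] * 30 + [0] * 70
for train_idx, test_idx in monte_carlo_stratified_splits(toy_labels, n_repeats=3):
    print(len(train_idx), len(test_idx))
```

In practice a model would be trained and scored (for example with the MCC) on every split, and the scores averaged across repeats.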

2.
PeerJ Comput Sci ; 10: e1896, 2024.
Article in English | MEDLINE | ID: mdl-38435625

ABSTRACT

Diabetes is a metabolic disorder that affects more than 420 million people worldwide, and it is caused by the presence of a high level of sugar in the blood for a long period. Diabetes can have serious long-term health consequences, such as cardiovascular diseases, strokes, chronic kidney diseases, foot ulcers, retinopathy, and others. Although common, this disease is not easy to spot, because it often comes with no symptoms. Especially for type 2 diabetes, which occurs mainly in adults, knowing how long the diabetes has been present for a patient can have a strong impact on the treatment they can receive. This information, although pivotal, might be absent: for some patients, in fact, the year when they received the diabetes diagnosis might be well-known, but the year of the disease onset might be unknown. In this context, machine learning applied to electronic health records can be an effective tool to predict the past duration of diabetes for a patient. In this study, we applied a regression analysis based on several computational intelligence methods to a dataset of electronic health records of 73 patients with diabetes type 1 with 20 variables and another dataset of records of 400 patients with diabetes type 2 with 49 variables. Among the algorithms applied, Random Forests was able to outperform the other ones and to efficiently predict diabetes duration for both cohorts, with the regression performances measured through the coefficient of determination R2. Afterwards, we applied the same method for feature ranking, and we detected the most relevant factors of the clinical records correlated with past diabetes duration: age, insulin intake, and body-mass index. Our discoveries can have a profound impact on clinical practice: when the information about the duration of diabetes of a patient is missing, medical doctors can use our tool and focus on age, insulin intake, and body-mass index to infer this important aspect.
Regarding limitations, unfortunately we were unable to find an additional dataset of EHRs of patients with diabetes having the same variables as the two analyzed here, so we could not verify our findings on a validation cohort.
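The coefficient of determination R2 used above to score the regressions can be written out in a few lines. This is a generic sketch of the standard formula, not code from the study; the function name and the toy durations are assumptions.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy diabetes durations in years (assumed values, not from the datasets).
observed = [2, 5, 10, 15]
predicted = [3, 5, 9, 14]
print(r_squared(observed, predicted))
```

A value of 1 means perfect prediction, 0 means the model does no better than always predicting the mean duration, and negative values mean it does worse than that baseline.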

3.
J Healthc Inform Res ; 8(1): 1-18, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38273986

ABSTRACT

Glioblastoma multiforme (GM) is a malignant tumor of the central nervous system considered to be highly aggressive and often carrying a terrible survival prognosis. An accurate prognosis is therefore pivotal for deciding a good treatment plan for patients. In this context, computational intelligence applied to data of electronic health records (EHRs) of patients diagnosed with this disease can be useful to predict the patients' survival time. In this study, we evaluated different machine learning models to predict survival time in patients suffering from glioblastoma and further investigated which features were the most predictive for survival time. We applied our computational methods to three different independent open datasets of EHRs of patients with glioblastoma: the Shieh dataset of 84 patients, the Berendsen dataset of 647 patients, and the Lammer dataset of 60 patients. Our survival time prediction techniques obtained concordance index (C-index) = 0.583 in the Shieh dataset, C-index = 0.776 in the Berendsen dataset, and C-index = 0.64 in the Lammer dataset, as best results in each dataset. Since the original studies regarding the three datasets analyzed here did not provide insights about the most predictive clinical features for survival time, we investigated the feature importance among these datasets. To this end, we then utilized Random Survival Forests, which is a decision tree-based algorithm able to model non-linear interaction between different features and might be able to better capture the highly complex clinical and genetic status of these patients. Our discoveries can impact clinical practice, aiding clinicians and patients alike to decide which therapy plan is best suited for their unique clinical status.
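The concordance index reported above measures how often a model ranks patients correctly by survival. The following is a minimal sketch of Harrell's C-index under simplifying assumptions (pairs with tied survival times are skipped, and a higher risk score is taken to mean shorter expected survival); the function name and toy data are hypothetical and this is not the implementation used in the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times: observed follow-up times
    events: 1 if the event (death) was observed, 0 if censored
    risk_scores: higher score = higher predicted risk (assumed convention)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had the event
            # strictly before patient j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy cohort (assumed values): months of follow-up, event flags, model risks.
print(concordance_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1]))
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the values between 0.583 and 0.776 reported for the three datasets.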

4.
PLoS Comput Biol ; 19(12): e1011700, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38127800

ABSTRACT

Fuzzy logic is a useful tool to describe and represent biological or medical scenarios, where often states and outcomes are not only completely true or completely false, but rather partially true or partially false. Despite its usefulness and spread, fuzzy logic modeling might easily be done in the wrong way, especially by beginners and inexperienced researchers, who might overlook some important aspects or might make common mistakes. Malpractices and pitfalls, in turn, can lead to wrong or overoptimistic, inflated results, with negative consequences for the biomedical research community trying to comprehend a particular phenomenon, or even for patients suffering from the investigated disease. To avoid common mistakes, we present here a list of quick tips for fuzzy logic modeling of any biomedical scenario: some guidelines which should be taken into account by any fuzzy logic practitioner, including experts. We believe our best practices can have a strong impact in the scientific community, allowing researchers who follow them to obtain better, more reliable results and outcomes in biomedical contexts.


Subjects
Fuzzy Logic, Medicine, Humans
5.
Eur J Cancer ; 193: 113291, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37708628

ABSTRACT

OBJECTIVE: To seek new candidate prognostic markers for neuroblastoma outcome, relapse or progression. MATERIALS AND METHODS: In this multicentre and retrospective study, Random Forests coupled with recursive feature elimination techniques were applied to electronic records (55 clinical features) of 3034 neuroblastoma patients. To assess model performance and feature importance, the dataset was split into a training set (80%) and a test set (20%). RESULTS: In the test set, the mean Matthews correlation coefficient for the Random Forests models was greater than 0.46. Feature importance analysis revealed that, together with maximum response to first-line treatment (D_MAX_RESP), time to maximum response to first-line treatment (TIME_MAX_RESP.days) is a relevant predictor of both patients' outcome and relapse/progression. We showed the prognostic value of the maximum response to first-line treatment in clinically relevant subsets of high-, intermediate-, and low-risk patients for both overall and relapse-free survival (log-rank p-value < 0.0001). In high-risk patients older than 18 months with stage 4 tumour achieving a complete response or very good partial response, patients who exhibited a D_MAX_RESP greater than 9 months showed a better prognosis with respect to patients achieving D_MAX_RESP earlier than 9 months (overall survival: hazard ratio 3.3, 95% confidence interval 1.8-5.9, log-rank p-value < 0.0001; relapse-free survival: hazard ratio 3.2, 95% confidence interval 1.8-5.6, log-rank p-value < 0.0001). CONCLUSION: Our findings evidence the emerging role of TIME_MAX_RESP.days, in addition to D_MAX_RESP, as a relevant predictor of outcome and relapse/progression in neuroblastoma, with potential clinical impact on the management and treatment of patients.

6.
PLoS Comput Biol ; 19(7): e1011224, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37410704

ABSTRACT

Data are the most important elements of bioinformatics: computational analysis of bioinformatics data, in fact, can help researchers infer new knowledge about biology, chemistry, biophysics, and sometimes even medicine, influencing treatments and therapies for patients. Bioinformatics and high-throughput biological data coming from different sources can be even more helpful, because each of these different data chunks can provide alternative, complementary information about a specific biological phenomenon, similar to multiple photos of the same subject taken from different angles. In this context, the integration of bioinformatics and high-throughput biological data plays a pivotal role in running a successful bioinformatics study. In the last decades, data originating from proteomics, metabolomics, metagenomics, phenomics, transcriptomics, and epigenomics have been labelled -omics data, as a unique name to refer to them, and the integration of these omics data has gained importance in all biological areas. Even if this omics data integration is useful and relevant, due to the heterogeneity of the data, it is not uncommon to make mistakes during the integration phases. We therefore decided to present these ten quick tips to perform an omics data integration correctly, avoiding common mistakes we experienced or noticed in published studies in the past. Even if we designed our ten guidelines for beginners, by using a simple language that (we hope) can be understood by anyone, we believe our ten recommendations should be taken into account by all the bioinformaticians performing omics data integration, including experts.


Subjects
Genomics, Multiomics, Humans, Proteomics, Computational Biology, Metabolomics
7.
PLoS Comput Biol ; 19(7): e1011272, 2023 07.
Article in English | MEDLINE | ID: mdl-37471333

ABSTRACT

Some scientific studies involve huge amounts of bioinformatics data that cannot be analyzed on personal computers usually employed by researchers for day-to-day activities but rather necessitate effective computational infrastructures that can work in a distributed way. For this purpose, distributed computing systems have become useful tools to analyze large amounts of bioinformatics data and to generate relevant results on virtual environments, where software can be executed for hours or even days without affecting the personal computer or laptop of a researcher. Even if distributed computing resources have become pivotal in multiple bioinformatics laboratories, often researchers and students use them in the wrong ways, making mistakes that can cause the distributed computers to underperform or that can even generate wrong outcomes. In this context, we present here ten quick tips for the usage of Apache Spark distributed computing systems for bioinformatics analyses: ten simple guidelines that, if taken into account, can help users avoid common mistakes and can help them run their bioinformatics analyses smoothly. Even if we designed our recommendations for beginners and students, they should be followed by experts too. We think our quick tips can help anyone make use of Apache Spark distributed computing systems more efficiently and ultimately help generate better, more reliable scientific results.


Subjects
Computational Biology, Software, Humans, Computational Biology/methods, Computers, Computer Communication Networks
8.
J Biomed Inform ; 144: 104426, 2023 08.
Article in English | MEDLINE | ID: mdl-37352899

ABSTRACT

Even if assessing binary classifications is a common task in scientific research, no consensus on a single statistic summarizing the confusion matrix has been reached so far. In recent studies, we demonstrated the advantages of the Matthews correlation coefficient (MCC) over other popular rates such as cross-entropy error, F1 score, accuracy, balanced accuracy, bookmaker informedness, diagnostic odds ratio, Brier score, and Cohen's kappa. In this study, we compared the MCC to two other statistics: prevalence threshold (PT), frequently used in obstetrics and gynecology, and the Fowlkes-Mallows index, a metric employed in fuzzy logic and drug discovery. Through the investigation of the mutual relations among the three metrics and the study of some relevant use cases, we show that, when positive data elements and negative data elements have the same importance, the Matthews correlation coefficient can be more informative than its two competitors, even this time.
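The three statistics compared in this abstract can all be computed from the four cells of a confusion matrix. The sketch below uses the standard textbook definitions; the function names and the example counts are assumptions, and the zero-denominator convention for the MCC (returning 0) is one common choice rather than something stated in the abstract.

```python
from math import sqrt

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient, in [-1, +1]."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column is empty
    return (tp * tn - fp * fn) / denominator

def prevalence_threshold(tp, fp, tn, fn):
    """PT = sqrt(FPR) / (sqrt(TPR) + sqrt(FPR)); lower is better."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return sqrt(fpr) / (sqrt(tpr) + sqrt(fpr))

def fowlkes_mallows(tp, fp, tn, fn):
    """FM = sqrt(precision * recall); ignores true negatives."""
    ppv = tp / (tp + fp)
    tpr = tp / (tp + fn)
    return sqrt(ppv * tpr)

# Assumed example: a classifier that labels almost everything as positive
# on a dataset with 90 positives and 10 negatives.
cells = dict(tp=90, fp=9, tn=1, fn=0)
print(mcc(**cells), prevalence_threshold(**cells), fowlkes_mallows(**cells))
```

In this assumed example the Fowlkes-Mallows index is high because it never looks at the negatives, while the MCC stays low because only one of the ten negatives was recognized.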


Subjects
Algorithms, Fuzzy Logic, Prevalence, Drug Discovery, Entropy
9.
BioData Min ; 16(1): 7, 2023 Mar 04.
Article in English | MEDLINE | ID: mdl-36870971

ABSTRACT

Neuroblastoma is a childhood neurological tumor which affects hundreds of thousands of children worldwide, and information about its prognosis can be pivotal for patients, their families, and clinicians. One of the main goals in the related bioinformatics analyses is to provide stable genetic signatures that include genes whose expression levels can effectively predict the prognosis of the patients. In this study, we collected the prognostic signatures for neuroblastoma published in the biomedical literature, and noticed that the most frequent genes present among them were three: AHCY, DPYSL3, and NME1. We therefore investigated the prognostic power of these three genes by performing a survival analysis and a binary classification on multiple gene expression datasets of different groups of patients diagnosed with neuroblastoma. Finally, we discussed the main studies in the literature associating these three genes with neuroblastoma. Our results, in each of these three steps of validation, confirm the prognostic capability of AHCY, DPYSL3, and NME1, and highlight their key role in neuroblastoma prognosis. Our results can have an impact on neuroblastoma genetics research: biologists and medical researchers can pay more attention to the regulation and expression of these three genes in patients having neuroblastoma, and therefore can develop better cures and treatments which can save patients' lives.

10.
BioData Min ; 16(1): 4, 2023 Feb 17.
Article in English | MEDLINE | ID: mdl-36800973

ABSTRACT

Binary classification is a common task for which machine learning and computational statistics are used, and the area under the receiver operating characteristic curve (ROC AUC) has become the common standard metric to evaluate binary classifications in most scientific fields. The ROC curve has true positive rate (also called sensitivity or recall) on the y axis and false positive rate on the x axis, and the ROC AUC can range from 0 (worst result) to 1 (perfect result). The ROC AUC, however, has several flaws and drawbacks. This score is generated including predictions that obtained insufficient sensitivity and specificity, and moreover it does not say anything about positive predictive value (also known as precision) nor negative predictive value (NPV) obtained by the classifier, therefore potentially generating inflated overoptimistic results. Since it is common to include ROC AUC alone without precision and negative predictive value, a researcher might erroneously conclude that their classification was successful. Furthermore, a given point in the ROC space does not identify a single confusion matrix nor a group of matrices sharing the same MCC value. Indeed, a given (sensitivity, specificity) pair can cover a broad MCC range, which casts doubts on the reliability of ROC AUC as a performance measure. In contrast, the Matthews correlation coefficient (MCC) generates a high score in its [-1, +1] interval only if the classifier scored a high value for all the four basic rates of the confusion matrix: sensitivity, specificity, precision, and negative predictive value. A high MCC (for example, MCC = 0.9), moreover, always corresponds to a high ROC AUC, and not vice versa. In this short study, we explain why the Matthews correlation coefficient should replace the ROC AUC as standard statistic in all the scientific studies involving a binary classification, in all scientific fields.
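The claim that one (sensitivity, specificity) pair can correspond to very different MCC values can be checked with a small numerical sketch. The helper names and the two cohorts below are assumptions chosen for illustration; only the formulas are standard.

```python
from math import sqrt

def confusion_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Build (tp, fp, tn, fn) for a dataset with n_pos positives and n_neg negatives."""
    tp = round(sensitivity * n_pos)
    tn = round(specificity * n_neg)
    return tp, n_neg - tn, tn, n_pos - tp

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0 by assumed convention if undefined."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0

def precision(tp, fp, tn, fn):
    """Positive predictive value."""
    return tp / (tp + fp) if (tp + fp) else 0.0

# Same point in ROC space (sensitivity = specificity = 0.9), two assumed cohorts.
balanced = confusion_from_rates(0.9, 0.9, n_pos=1000, n_neg=1000)
imbalanced = confusion_from_rates(0.9, 0.9, n_pos=100, n_neg=10000)
for name, cells in (("balanced", balanced), ("imbalanced", imbalanced)):
    print(name, round(precision(*cells), 3), round(mcc(*cells), 3))
```

Under these assumptions the ROC coordinates are identical in both cohorts, yet precision and MCC drop sharply in the imbalanced one, which is the kind of information the abstract says ROC AUC alone does not convey.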

11.
BioData Min ; 16(1): 6, 2023 Feb 23.
Article in English | MEDLINE | ID: mdl-36823520

ABSTRACT

Bioinformatics has become a key aspect of the biomedical research programmes of many hospitals' scientific centres, and the establishment of bioinformatics facilities within hospitals has become a common practice worldwide. Bioinformaticians working in these facilities provide computational biology support to medical doctors and principal investigators who deal daily with patient data to analyze. These bioinformatics analysts, although pivotal, usually do not receive formal training for this job. We therefore propose these ten simple rules to guide these bioinformaticians in their work: ten pieces of advice on how to provide bioinformatics support to medical doctors in hospitals. We believe these simple rules can help bioinformatics facility analysts produce better scientific results and work in a serene and fruitful environment.

12.
PLoS Comput Biol ; 19(1): e1010778, 2023 01.
Article in English | MEDLINE | ID: mdl-36602952

ABSTRACT

Medical imaging is a great asset for modern medicine, since it allows physicians to spatially interrogate a disease site, resulting in precise intervention for diagnosis and treatment, and to observe particular aspects of patients' conditions that otherwise would not be noticeable. Computational analysis of medical images, moreover, can allow the discovery of disease patterns and correlations among cohorts of patients with the same disease, thus suggesting common causes or providing useful information for better therapies and cures. Machine learning and deep learning applied to medical images, in particular, have produced new, unprecedented results that can pave the way to advanced frontiers of medical discoveries. While computational analysis of medical images has become easier, however, making mistakes or generating inflated or misleading results has become easier, too, hindering reproducibility and deployment. In this article, we provide ten quick tips to perform computational analysis of medical images avoiding common mistakes and pitfalls that we noticed in multiple studies in the past. We believe our ten guidelines, if put into practice, can help the computational-medical imaging community to perform better scientific research that eventually can have a positive impact on the lives of patients worldwide.


Subjects
Machine Learning, Humans, Reproducibility of Results
13.
PLoS Comput Biol ; 18(12): e1010718, 2022 12.
Article in English | MEDLINE | ID: mdl-36520712

ABSTRACT

Applying computational statistics or machine learning methods to data is a key component of many scientific studies, in any field, but alone might not be sufficient to generate robust and reliable outcomes and results. Before applying any discovery method, preprocessing steps are necessary to prepare the data for the computational analysis. In this framework, data cleaning and feature engineering are key pillars of any scientific study involving data analysis, and they should be adequately designed and performed from the first phases of the project. We call "feature" a variable describing a particular trait of a person or an observation, recorded usually as a column in a dataset. Even if pivotal, these data cleaning and feature engineering steps sometimes are done poorly or inefficiently, especially by beginners and inexperienced researchers. For this reason, we propose here our quick tips for data cleaning and feature engineering on how to carry out these important preprocessing steps correctly, avoiding common mistakes and pitfalls. Although we designed these guidelines with bioinformatics and health informatics scenarios in mind, we believe they can more generally be applied to any scientific area. We therefore target these guidelines to any researcher or practitioner wanting to perform data cleaning or feature engineering. We believe our simple recommendations can help researchers and scholars perform better computational analyses that can lead, in turn, to more solid outcomes and more reliable discoveries.


Subjects
Computational Biology, Machine Learning, Humans, Computational Biology/methods, Engineering
14.
BioData Min ; 15(1): 28, 2022 Nov 03.
Article in English | MEDLINE | ID: mdl-36329531

ABSTRACT

Cancer is one of the leading causes of death worldwide and can be caused by environmental aspects (for example, exposure to asbestos), by human behavior (such as smoking), or by genetic factors. To understand which genes might be involved in patients' survival, researchers have invented prognostic genetic signatures: lists of genes that can be used in scientific analyses to predict if a patient will survive or not. In this study, we joined together five different prognostic signatures, each of them related to a specific cancer type, to generate a unique pan-cancer prognostic signature that contains 207 unique probesets related to 187 unique gene symbols, with one particular probeset present in two cancer type-specific signatures (203072_at, related to the MYO1E gene). We applied our proposed pan-cancer signature with the Random Forests machine learning method to 57 microarray gene expression datasets of 12 different cancer types, and analyzed the results. We also compared the performance of our pan-cancer signature with the performances of two alternative prognostic signatures, and with the performances of each cancer type-specific signature on their corresponding cancer type-specific datasets. Our results confirmed the effectiveness of our prognostic pan-cancer signature. Moreover, we performed a pathway enrichment analysis, which indicated an association among the signature genes, and a protein-protein interaction analysis, which highlighted PIK3R2 and FN1 as key genes having a fundamental relevance in our signature, suggesting an important role in pan-cancer prognosis for both of them.

15.
Front Bioinform ; 2: 968327, 2022.
Article in English | MEDLINE | ID: mdl-36388843

ABSTRACT

Functional enrichment analysis or pathway enrichment analysis (PEA) is a bioinformatics technique which identifies the most over-represented biological pathways in a list of genes compared to those that would be associated with them by chance. These biological functions are found in annotated bioinformatics databases such as The Gene Ontology or KEGG; the more abundant pathways are identified through statistical techniques such as Fisher's exact test. Most PEA tools require a list of genes as input. A few tools, however, read lists of genomic regions as input rather than lists of genes, and first associate these chromosome regions with their corresponding genes. These tools perform a procedure called genomic regions enrichment analysis, which can be useful for detecting the biological pathways related to a set of chromosome regions. In this brief survey, we analyze six tools for genomic regions enrichment analysis (BEHST, g:Profiler g:GOSt, GREAT, LOLA, Poly-Enrich, and ReactomePA), outlining and comparing their main features. Our comparison results indicate that the inclusion of data for regulatory elements, such as ChIP-seq, is common among these tools and could therefore improve the enrichment analysis results.
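The over-representation test mentioned above (Fisher's exact test) can be sketched for a single pathway as a one-sided hypergeometric tail probability. This is a generic illustration and not the procedure of any of the six surveyed tools; the function name, parameter names, and gene counts are assumptions.

```python
from math import comb

def enrichment_p_value(n_universe, n_pathway, n_selected, n_overlap):
    """One-sided Fisher's exact test for over-representation.

    Probability of seeing at least n_overlap pathway genes when drawing
    n_selected genes at random from a universe of n_universe genes,
    n_pathway of which belong to the pathway.
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Assumed example: 20,000 annotated genes, a pathway of 100 genes,
# an input list of 200 genes, 8 of which fall in the pathway.
print(enrichment_p_value(20000, 100, 200, 8))
```

A real analysis would repeat this test for every pathway in the database and then correct the resulting p-values for multiple testing, a step that this sketch leaves out.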

17.
PLoS Comput Biol ; 18(8): e1010348, 2022 08.
Article in English | MEDLINE | ID: mdl-35951505

ABSTRACT

Pathway enrichment analysis (PEA) is a computational biology method that identifies biological functions that are overrepresented in a group of genes more than would be expected by chance and ranks these functions by relevance. The relative abundance of genes pertinent to specific pathways is measured through statistical methods, and associated functional pathways are retrieved from online bioinformatics databases. In the last decade, along with the spread of the internet, higher availability of computational resources made PEA software tools easy to access and to use for bioinformatics practitioners worldwide. Although it became easier to use these tools, it also became easier to make mistakes that could generate inflated or misleading results, especially for beginners and inexperienced computational biologists. With this article, we propose nine quick tips to avoid common mistakes and to carry out a complete, sound, thorough PEA, which can produce relevant and robust results. We describe our nine guidelines in a simple way, so that they can be understood and used by anyone, including students and beginners. Some tips explain what to do before starting a PEA, others are suggestions of how to correctly generate meaningful results, and some final guidelines indicate some useful steps to properly interpret PEA results. Our nine tips can help users perform better pathway enrichment analyses and eventually contribute to a better understanding of current biology.


Subjects
Computational Biology, Software, Computational Biology/methods, Factual Databases, Humans
18.
PLoS Comput Biol ; 18(8): e1010395, 2022 08.
Article in English | MEDLINE | ID: mdl-36006874

ABSTRACT

Special sessions are important parts of scientific meetings and conferences: they gather researchers and students interested in a specific topic and can strongly contribute to the success of the conference itself. Moreover, they can be the first step for trainees and students toward organizing a scientific event. Organizing a special session, however, can be difficult for beginners and students. Here, we provide ten simple rules to follow to organize a special session at a scientific conference.


Subjects
Researchers, Students, Humans
20.
Methods Mol Biol ; 2401: 187-194, 2022.
Article in English | MEDLINE | ID: mdl-34902129

ABSTRACT

Gene expression profiling is a useful way to measure the activity of genes in molecular biology and, because of its effectiveness, researchers have released thousands of gene expression datasets publicly in online databases and repositories, such as Gene Expression Omnibus (GEO). To read and analyze gene expression data, the computational biology community has developed several tools and platforms, including Bioconductor, an R open-source platform of software packages that can be used to analyze these data. Despite the usefulness of Bioconductor and of its packages, it is still difficult to read gene expression data from GEO, and to assign gene symbols to the probesets of datasets. To alleviate this problem, we introduce here a new R software package, geneExpressionFromGEO, which lets users easily download gene expression data from GEO and easily associate gene symbols with probesets. In this short chapter, we describe the assets of our software package, and we report an example of its usage. We believe that geneExpressionFromGEO can be very useful for the R community of bioinformaticians working on gene expression data.


Subjects
Gene Expression, Computational Biology, Gene Expression Profiling, Reading, Software